AITopics | maximum margin classifier

Collaborating Authors

maximum margin classifier

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures

Neural Information Processing SystemsApr-25-2026, 16:37:56 GMT

Modern machine learning systems such as deep neural networks are often highly over-parameterized so that they can fit the noisy training data exactly, yet they can still achieve small test errors in practice. In this paper, we study this "benign overfitting" phenomenon of the maximum margin classifier for linear classification problems. Specifically, we consider data generated from sub-Gaussian mixtures, and provide a tight risk bound for the maximum margin linear classifier in the over-parameterized setting. Our results precisely characterize the condition under which benign overfitting can occur in linear classification problems, and improve on previous work. They also have direct implications for over-parameterized logistic regression.

artificial intelligence, machine learning, population risk, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures

Neural Information Processing SystemsFeb-8-2026, 11:06:48 GMT

In this paper, we study this "benign

artificial intelligence, machine learning, population risk, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Benign Overfitting for $\alpha$ Sub-exponential Input

Okudo, Kota, Kobayashi, Kei

arXiv.org Machine LearningSep-1-2024

This paper investigates the phenomenon of benign overfitting in binary classification problems with heavy-tailed input distributions. We extend the analysis of maximum margin classifiers to $\alpha$ sub-exponential distributions, where $\alpha \in (0,2]$, generalizing previous work that focused on sub-gaussian inputs. Our main result provides generalization error bounds for linear classifiers trained using gradient descent on unregularized logistic loss in this heavy-tailed setting. We prove that under certain conditions on the dimensionality $p$ and feature vector magnitude $\|\mu\|$, the misclassification error of the maximum margin classifier asymptotically approaches the noise level. This work contributes to the understanding of benign overfitting in more robust distribution settings and demonstrates that the phenomenon persists even with heavier-tailed inputs than previously studied.

classifier, exp, probability, (12 more...)

arXiv.org Machine Learning

2409.00733

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Mistake, Manipulation and Margin Guarantees in Online Strategic Classification

Shen, Lingqing, Ho-Nguyen, Nam, Giang-Tran, Khanh-Hung, Kılınç-Karzan, Fatma

arXiv.org Artificial IntelligenceMar-26-2024

Binary classification is a well-known problem in supervised learning, with applications in numerous important domains such as marketing, finance, natural language processing and medicine. The traditional binary classification problem aims to learn a decision rule that maps feature vectors to binary labels 1, with the aim of predicting a true underlying label for a feature vector. For example, features may correspond to identifying information of customers of a bank who apply for a loan, and in this context the label may indicate whether the bank will approve or deny the loan. The true underlying label, whether the bank should approve or deny given all future outcomes, is unknown at the time of the loan application, which necessitates the need to use a classification rule. Like the example above, binary classification is now regularly applied to various applications involving human agents. Customers obviously prefer that their loan application be approved rather than denied, and in many other applications there may similarly be one label that is preferred by agents over the other. One can imagine that in practice this leads to strategic behavior of agents, where feature vectors are manipulated in order to achieve the desired label prediction. Of course, there is often also a cost associated with manipulation, so agents may manipulate only if the payoff for achieving the desirable label is worth the cost.

algorithm 1, algorithm 3, classifier, (14 more...)

arXiv.org Artificial Intelligence

2403.18176

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(9 more...)

Genre: Research Report (0.49)

Industry:

Education > Educational Setting > Online (1.00)
Banking & Finance (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Is Importance Weighting Incompatible with Interpolating Classifiers?

Wang, Ke Alexander, Chatterji, Niladri S., Haque, Saminul, Hashimoto, Tatsunori

arXiv.org Machine LearningDec-24-2021

Importance weighting is a classic technique to handle distribution shifts. However, prior work has presented strong empirical and theoretical evidence demonstrating that importance weights can have little to no effect on overparameterized neural networks. Is importance weighting truly incompatible with the training of overparameterized neural networks? Our paper answers this in the negative. We show that importance weighting fails not because of the overparameterization, but instead, as a result of using exponentially-tailed losses like the logistic or cross-entropy loss. As a remedy, we show that polynomially-tailed losses restore the effects of importance reweighting in correcting distribution shift in overparameterized models. We characterize the behavior of gradient descent on importance weighted polynomially-tailed losses with overparameterized linear models, and theoretically demonstrate the advantage of using polynomially-tailed losses in a label shift setting. Surprisingly, our theory shows that using weights that are obtained by exponentiating the classical unbiased importance weights can improve performance. Finally, we demonstrate the practical value of our analysis with neural network experiments on a subpopulation shift and a label shift dataset. When reweighted, our loss function can outperform reweighted cross-entropy by as much as 9% in test accuracy. Our loss function also gives test accuracies comparable to, or even exceeding, well-tuned state-of-the-art methods for correcting distribution shifts.

classifier, gradient descent, importance weight, (15 more...)

arXiv.org Machine Learning

2112.12986

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Add feedback

Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures

Cao, Yuan, Gu, Quanquan, Belkin, Mikhail

arXiv.org Machine LearningApr-28-2021

In modern machine learning, complex models such as deep neural networks have received increasing popularity. These complicated models are known to be able to fit noisy training data sets, while at the same time achieving small test errors. In fact, this benign overfitting phenomenon is not a unique feature of deep learning. Even for kernel methods and linear models, Belkin et al. (2018) demonstrated that interpolators on the noisy training data can still perform near optimally on the test data. A series of recent works (Belkin et al., 2019b; Muthukumar et al., 2020b; Hastie et al., 2019; Bartlett et al., 2020) theoretically studied how over-parameterization can achieve small population risk. In particular in Bartlett et al. (2020) the authors considered the setting where the data are generated from a ground-truth linear model with noise, and established a tight population risk bound for the minimum norm linear interpolator with a matching lower bound. More recently, Tsigler and Bartlett (2020) further studied benign overfitting in ridge regression, and established non-asymptotic generalization bounds for over-parametrized ridge regression. They showed that those bounds are tight for a range of regularization parameter values. Notably, these results cover arbitrary covariance structure of the data, and give a nice characterization of how the spectrum of the data covariance matrix affects the population risk in the over-parameterized regime.

lemma 4, population risk, theorem 3, (13 more...)

arXiv.org Machine Learning

2104.13628

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Add feedback

Finite-sample analysis of interpolating linear classifiers in the overparameterized regime

Chatterji, Niladri S., Long, Philip M.

arXiv.org Machine LearningSep-4-2020

A surprising statistical phenomenon has emerged in modern machine learning: highly complex models can interpolate training data while still generalizing well to test data, even in the presence of label noise. This is rather striking as it the goes against the grain of the classical statistical wisdom which dictates that predictors that generalize well should trade off between the fit to the training data and the some measure of the complexity or smoothness of the predictor. Many estimators like neural networks, kernel estimators, nearest neighbour estimators, and even linear models have been shown to demonstrate this phenomenon (see, Zhang et al. 2017; Belkin et al. 2019, among others). This phenomenon has recently inspired intense theoretical research. One line of work (Soudry et al. 2018; Ji and Telgarsky 2019; Gunasekar et al. 2017; Nacson, Srebro, and Soudry 2019; Gunasekar et al. 2018a; Gunasekar et al. 2018b) formalized the argument (Neyshabur, Tomioka, and Srebro 2014; Neyshabur 2017) that, even when there is no explicit regularization that is used in training these rich models, there is nevertheless implicit regularization encoded in the choice of the optimization method used. For example, in the setting of linear classification, (Soudry et al. 2018; Ji and Telgarsky 2019; Nacson, Srebro, and Soudry 2019) show that learning a linear classifier using gradient descent on the unregularized logistic or exponential loss asymptotically leads the solution to converge to the maximum l

artificial intelligence, classifier, machine learning, (17 more...)

arXiv.org Machine Learning

2004.12019

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

Convergence of SGD in Learning ReLU Models with Separable Data

Xu, Tengyu, Zhou, Yi, Ji, Kaiyi, Liang, Yingbin

arXiv.org Machine LearningJun-12-2018

We consider the binary classification problem in which the objective function is the exponential loss with a ReLU model, and study the convergence property of the stochastic gradient descent (SGD) algorithm on linearly separable data. We show that the gradient descent (GD) algorithm do not always learn desirable model parameters due to the nonlinear ReLU model. Then, we identify a certain condition of data samples, under which we show that SGD can learn a proper classifier with implicit bias. In specific, we establish the sub-linear convergence rate of the function value generated by SGD to global minimum. We further show that SGD actually converges in expectation to the maximum margin classifier with respect to the samples with +1 label under the ReLU model at the rate O(1/ln t). We also extend our study to the case of multi-ReLU neurons, and show that SGD converges to a certain non-linear maximum margin classifier for a class of non-linearly separable data.

artificial intelligence, machine learning, sgd, (15 more...)

arXiv.org Machine Learning

1806.04339

Country: North America > United States > Ohio (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.90)

Add feedback

Maximum margin classifier working in a set of strings

Koyano, Hitoshi, Hayashida, Morihiro, Akutsu, Tatsuya

arXiv.org Machine LearningFeb-22-2016

Numbers and numerical vectors account for a large portion of data. However, recently the amount of string data generated has increased dramatically. Consequently, classifying string data is a common problem in many fields. The most widely used approach to this problem is to convert strings into numerical vectors using string kernels and subsequently apply a support vector machine that works in a numerical vector space. However, this non-one-to-one conversion involves a loss of information and makes it impossible to evaluate, using probability theory, the generalization error of a learning machine, considering that the given data to train and test the machine are strings generated according to probability laws. In this study, we approach this classification problem by constructing a classifier that works in a set of strings. To evaluate the generalization error of such a classifier theoretically, probability theory for strings is required. Therefore, we first extend a limit theorem on the asymptotic behavior of a consensus sequence of strings, which is the counterpart of the mean of numerical vectors, as demonstrated in the probability theory on a metric space of strings developed by one of the authors and his colleague in a previous study [18]. Using the obtained result, we then demonstrate that our learning machine classifies strings in an asymptotically optimal manner. Furthermore, we demonstrate the usefulness of our machine in practical data analysis by applying it to predicting protein--protein interactions using amino acid sequences.

artificial intelligence, machine learning, maximum margin classifier

arXiv.org Machine Learning

1406.0597

Genre: Research Report (0.69)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.53)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.53)

Add feedback